{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rFiCyWQ-NC5D"
   },
   "source": [
    "# Ungraded Lab: Multiple LSTMs\n",
    "\n",
    "In this lab, you will look at how to build a model with multiple LSTM layers. Since you know the preceding steps already (e.g. downloading datasets, preparing the data, etc.), we won't expound on it anymore so you can just focus on the model building code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KeGliUKn-44h"
   },
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7mnIF-4FnzmG"
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import tensorflow_datasets as tfds\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import keras_nlp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xqmDNHeByJqr"
   },
   "source": [
    "## Load and Prepare the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "AW-4Vo4TMUHb"
   },
   "outputs": [],
   "source": [
    "# The dataset is already downloaded for you. For downloading you can use the code below.\n",
    "imdb = tfds.load(\"imdb_reviews\", as_supervised=True, data_dir=\"../data/\", download=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "UpAGm8UQnZdV"
   },
   "outputs": [],
   "source": [
    "# Extract the train reviews and labels\n",
    "train_reviews = imdb['train'].map(lambda review, label: review)\n",
    "train_labels = imdb['train'].map(lambda review, label: label)\n",
    "\n",
    "# Extract the test reviews and labels\n",
    "test_reviews = imdb['test'].map(lambda review, label: review)\n",
    "test_labels = imdb['test'].map(lambda review, label: label)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "lvm4ZwdPndhS"
   },
   "outputs": [],
   "source": [
    "# Download the subword vocabulary (not needed in Coursera)\n",
    "# !wget -nc https://storage.googleapis.com/tensorflow-1-public/course3/imdb_vocab_subwords.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3VOghFmInfdY"
   },
   "outputs": [],
   "source": [
    "# Initialize the subword tokenizer\n",
    "subword_tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(\n",
    "    vocabulary='./imdb_vocab_subwords.txt'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fF8bUh_5Ff7y"
   },
   "source": [
    "Like the previous lab, we increased the `BATCH_SIZE` here to make the training faster. If you are doing this on your local machine and have a powerful processor, feel free to use the value used in the lecture (i.e. 64) to get the same results as Laurence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ffvRUI0_McDS"
   },
   "outputs": [],
   "source": [
    "# Data pipeline and padding parameters\n",
    "SHUFFLE_BUFFER_SIZE = 10000\n",
    "PREFETCH_BUFFER_SIZE = tf.data.AUTOTUNE\n",
    "BATCH_SIZE = 256\n",
    "PADDING_TYPE = 'pre'\n",
    "TRUNC_TYPE = 'post'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "R1DVetUinjks"
   },
   "outputs": [],
   "source": [
    "def padding_func(sequences):\n",
    "  '''Generates padded sequences from a tf.data.Dataset'''\n",
    "\n",
    "  # Put all elements in a single ragged batch\n",
    "  sequences = sequences.ragged_batch(batch_size=sequences.cardinality())\n",
    "\n",
    "  # Output a tensor from the single batch\n",
    "  sequences = sequences.get_single_element()\n",
    "\n",
    "  # Pad the sequences\n",
    "  padded_sequences = tf.keras.utils.pad_sequences(sequences.numpy(), \n",
    "                                                  truncating=TRUNC_TYPE, \n",
    "                                                  padding=PADDING_TYPE\n",
    "                                                 )\n",
    "\n",
    "  # Convert back to a tf.data.Dataset\n",
    "  padded_sequences = tf.data.Dataset.from_tensor_slices(padded_sequences)\n",
    "\n",
    "  return padded_sequences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NG3unVVFnnJ8"
   },
   "outputs": [],
   "source": [
    "# Generate integer sequences using the subword tokenizer\n",
    "train_sequences_subword = train_reviews.map(lambda review: subword_tokenizer.tokenize(review)).apply(padding_func)\n",
    "test_sequences_subword = test_reviews.map(lambda review: subword_tokenizer.tokenize(review)).apply(padding_func)\n",
    "\n",
    "# Combine the integer sequence and labels\n",
    "train_dataset_vectorized = tf.data.Dataset.zip(train_sequences_subword,train_labels)\n",
    "test_dataset_vectorized = tf.data.Dataset.zip(test_sequences_subword,test_labels)\n",
    "\n",
    "# Optimize the datasets for training\n",
    "train_dataset_final = (train_dataset_vectorized\n",
    "                       .shuffle(SHUFFLE_BUFFER_SIZE)\n",
    "                       .cache()\n",
    "                       .prefetch(buffer_size=PREFETCH_BUFFER_SIZE)\n",
    "                       .batch(BATCH_SIZE)\n",
    "                       )\n",
    "\n",
    "test_dataset_final = (test_dataset_vectorized\n",
    "                      .cache()\n",
    "                      .prefetch(buffer_size=PREFETCH_BUFFER_SIZE)\n",
    "                      .batch(BATCH_SIZE)\n",
    "                      )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xcZEiG9ayNZr"
   },
   "source": [
    "## Build and Compile the Model\n",
    "\n",
    "You can build multiple layer LSTM models by simply appending another `LSTM` layer in your `Sequential` model and enabling the `return_sequences` flag to `True`. This is because an `LSTM` layer expects a sequence input so if the previous layer is also an LSTM, then it should output a sequence as well. See the code cell below that demonstrates this flag in action. You'll notice that the output dimension is in 3 dimensions `(batch_size, timesteps, features)` when `return_sequences` is True."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "18MsI2LU75kH"
   },
   "outputs": [],
   "source": [
    "# Parameters\n",
    "BATCH_SIZE = 1\n",
    "TIMESTEPS = 20\n",
    "FEATURES = 16\n",
    "LSTM_DIM = 8\n",
    "\n",
    "print(f'batch_size: {BATCH_SIZE}')\n",
    "print(f'timesteps (sequence length): {TIMESTEPS}')\n",
    "print(f'features (embedding size): {FEATURES}')\n",
    "print(f'lstm output units: {LSTM_DIM}')\n",
    "\n",
    "# Define array input with random values\n",
    "random_input = np.random.rand(BATCH_SIZE,TIMESTEPS,FEATURES)\n",
    "print(f'shape of input array: {random_input.shape}')\n",
    "\n",
    "# Define LSTM that returns a single output\n",
    "lstm = tf.keras.layers.LSTM(LSTM_DIM)\n",
    "result = lstm(random_input)\n",
    "print(f'shape of lstm output(return_sequences=False): {result.shape}')\n",
    "\n",
    "# Define LSTM that returns a sequence\n",
    "lstm_rs = tf.keras.layers.LSTM(LSTM_DIM, return_sequences=True)\n",
    "result = lstm_rs(random_input)\n",
    "print(f'shape of lstm output(return_sequences=True): {result.shape}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6Was3BX6_50C"
   },
   "source": [
    "The next cell implements the stacked LSTM architecture."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "VPNwU1SVyTjm"
   },
   "outputs": [],
   "source": [
    "# Model parameters\n",
    "EMBEDDING_DIM = 64\n",
    "LSTM1_DIM = 32\n",
    "LSTM2_DIM = 16\n",
    "DENSE_DIM = 64\n",
    "\n",
    "# Build the model\n",
    "model = tf.keras.Sequential([\n",
    "    tf.keras.Input(shape=(None,)),\n",
    "    tf.keras.layers.Embedding(subword_tokenizer.vocabulary_size(), EMBEDDING_DIM),\n",
    "    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM1_DIM, return_sequences=True)),\n",
    "    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM2_DIM)),\n",
    "    tf.keras.layers.Dense(DENSE_DIM, activation='relu'),\n",
    "    tf.keras.layers.Dense(1, activation='sigmoid')\n",
    "])\n",
    "\n",
    "# Print the model summary\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Uip7QOVzMoMq"
   },
   "outputs": [],
   "source": [
    "# Set the training parameters\n",
    "model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uh39GlZP79DY"
   },
   "source": [
    "## Train the Model\n",
    "\n",
    "The additional LSTM layer will lengthen the training time compared to the previous lab. Given the default parameters, it will take around 2 minutes per epoch in your lab environment. Also, since this is a larger model, it might start to overfit quickly so you may want to use fewer epochs or use a callback to monitor the validation accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7mlgzaRDMtF6"
   },
   "outputs": [],
   "source": [
    "NUM_EPOCHS = 5\n",
    "\n",
    "# Train the model\n",
    "history = model.fit(train_dataset_final, epochs=NUM_EPOCHS, validation_data=test_dataset_final)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Mp1Z7P9pYRSK"
   },
   "outputs": [],
   "source": [
    "def plot_loss_acc(history):\n",
    "  '''Plots the training and validation loss and accuracy from a history object'''\n",
    "  acc = history.history['accuracy']\n",
    "  val_acc = history.history['val_accuracy']\n",
    "  loss = history.history['loss']\n",
    "  val_loss = history.history['val_loss']\n",
    "\n",
    "  epochs = range(len(acc))\n",
    "\n",
    "  fig, ax = plt.subplots(1,2, figsize=(12, 6))\n",
    "  ax[0].plot(epochs, acc, 'bo', label='Training accuracy')\n",
    "  ax[0].plot(epochs, val_acc, 'b', label='Validation accuracy')\n",
    "  ax[0].set_title('Training and validation accuracy')\n",
    "  ax[0].set_xlabel('epochs')\n",
    "  ax[0].set_ylabel('accuracy')\n",
    "  ax[0].legend()\n",
    "\n",
    "  ax[1].plot(epochs, loss, 'bo', label='Training Loss')\n",
    "  ax[1].plot(epochs, val_loss, 'b', label='Validation Loss')\n",
    "  ax[1].set_title('Training and validation loss')\n",
    "  ax[1].set_xlabel('epochs')\n",
    "  ax[1].set_ylabel('loss')\n",
    "  ax[1].legend()\n",
    "\n",
    "  plt.show()\n",
    "\n",
    "plot_loss_acc(history)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "txQdN63vBlTK"
   },
   "source": [
    "## Wrap Up\n",
    "\n",
    "This lab showed how you can build deep networks by stacking LSTM layers. In the next labs, you will continue exploring other architectures you can use to implement your sentiment classification model. \n",
    "\n",
    "As before, run the cell below to free up resources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Shutdown the kernel to free up resources. \n",
    "# Note: You can expect a pop-up when you run this cell. You can safely ignore that and just press `Ok`.\n",
    "\n",
    "from IPython import get_ipython\n",
    "\n",
    "k = get_ipython().kernel\n",
    "\n",
    "k.do_shutdown(restart=False)"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "name": "C3_W3_Lab_2_multiple_layer_LSTM.ipynb",
   "private_outputs": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
